Load the dataset and import necessary libraries

I use two datasets in this experiment: the Diabetes dataset, one of scikit-learn's toy datasets, and the California housing dataset, one of its real-world datasets.

Convert the dataset to a pandas DataFrame, check summary statistics, and preprocess the data
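A minimal sketch of this step for the Diabetes dataset, assuming scikit-learn and pandas are installed. The variable names (`raw`, `df`, `missing`) and the column name "target" are my own choices for illustration, not part of the dataset.

```python
import pandas as pd
from sklearn.datasets import load_diabetes

# Load the bundled Diabetes data (no download needed).
raw = load_diabetes()

# Build a DataFrame from the feature matrix and append the target as a column.
# "target" is an assumed column name chosen for this sketch.
df = pd.DataFrame(raw.data, columns=raw.feature_names)
df["target"] = raw.target

# Summary statistics per column and a quick missing-value check.
print(df.describe().T)
missing = int(df.isna().sum().sum())
print("missing values:", missing)
```

Note that `load_diabetes` returns features that are already mean-centered and scaled by default, so little further preprocessing is needed before computing correlations.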

Exploratory data analysis and feature engineering

Choose features that are highly correlated with the target variable, but always check for multicollinearity when doing so. If two features are both highly correlated with the target and also highly correlated with each other, keep only one of them. In this dataset two features are highly correlated (>0.5) with the target: "bmi" and "s5". The correlation between "bmi" and "s5" is 0.45 (<0.5), so I keep both.
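The selection rule above can be sketched as a small helper. This is one possible way to do it, not the only one: `select_features` is a hypothetical function written for this example, it uses absolute Pearson correlation, and it applies the same 0.5 threshold both to the target and to the check between features.

```python
from sklearn.datasets import load_diabetes


def select_features(df, target, threshold=0.5):
    """Keep features whose |correlation| with `target` exceeds `threshold`,
    skipping any feature that is also correlated above `threshold` with a
    feature that was already kept (a simple multicollinearity check)."""
    corr = df.corr()
    # Rank features by the strength of their correlation with the target.
    strength = corr[target].drop(target).abs().sort_values(ascending=False)
    candidates = strength[strength > threshold].index

    kept = []
    for name in candidates:
        # Keep the feature only if it is not strongly correlated with a kept one.
        if all(abs(corr.loc[name, other]) <= threshold for other in kept):
            kept.append(name)
    return kept


# as_frame=True returns a DataFrame with the features plus a "target" column.
df = load_diabetes(as_frame=True).frame
selected = select_features(df, "target")
print(selected)
```

Because candidates are visited from strongest to weakest, when two features are correlated with each other the one more strongly tied to the target is the one that survives.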

Load and preprocess the California housing dataset

EDA and feature engineering

In this dataset only one feature is highly correlated (>0.5) with the target: "MedInc". Its correlation with "Y" is 0.69, so I use this variable. The other variables are only weakly correlated with the target (<0.2). To keep the experiments with Gradient Descent simple, I decided not to use them: they are not highly correlated with the target, and because the number of instances is large, it is better for GD to keep the number of dimensions small.
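A hedged sketch of how the ranking for this dataset could be computed. Both function names are hypothetical. I assume the scikit-learn target column is named "MedHouseVal" and rename it to "Y" to match the text. Since `fetch_california_housing` downloads the data on first use, the loader is only defined here, and the ranking helper is demonstrated on a small synthetic frame instead.

```python
import numpy as np
import pandas as pd


def rank_by_target_correlation(df, target):
    """Return |Pearson correlation| of each feature with `target`, strongest first."""
    corr = df.corr()[target].drop(target)
    return corr.abs().sort_values(ascending=False)


def load_california_frame():
    """Fetch California housing as a DataFrame (downloads on first call)."""
    from sklearn.datasets import fetch_california_housing

    housing = fetch_california_housing(as_frame=True)
    # Assumption: the target column is "MedHouseVal"; renamed to "Y" as in the text.
    return housing.frame.rename(columns={"MedHouseVal": "Y"})


# Synthetic stand-in data (NOT the housing data): one informative feature, one noise feature.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
demo = pd.DataFrame({
    "strong": x,
    "weak": rng.normal(size=500),
    "Y": 2.0 * x + rng.normal(scale=0.5, size=500),
})

ranking = rank_by_target_correlation(demo, "Y")
selected = ranking[ranking > 0.5].index.tolist()
print(selected)
```

On the real data, `rank_by_target_correlation(load_california_frame(), "Y")` would give the ranking discussed above, and the same `> 0.5` filter would pick out the features to keep.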